Abstract

In this work, a goodness-of-fit test for the null hypothesis of a functional linear model with 
scalar response is proposed. The test is based on a generalization to the functional framework of 
a previous one, designed for the goodness-of-fit of regression models with multivariate covariates 
using random projections. The test statistic is easy to compute using geometrical and matrix 
arguments, and simple to calibrate in its distribution by a wild bootstrap on the residuals. The 
finite sample properties of the test are illustrated by a simulation study for several types of basis 
and under different alternatives. Finally, the test is applied to two datasets for checking the 
assumption of the functional linear model and a graphical tool is introduced. Supplementary 
materials are available online. 
1 Introduction

Functional data analysis has grown in popularity in recent years owing to the increasing availability
of data from continuous-time processes. Typical examples of functional data include temperature
curves, stock prices and the trajectories of moving objects. New statistical methods have been
developed to deal with the richer nature of functional data; Ramsay and Silverman (2005), Ferraty
and Vieu (2006) and Ferraty and Romain (2011) are some of the main reference books in this area.

In many situations, the functional data are related to a scalar variable. In these cases, it is of interest
to assess the relation between the variables via a regression model, which can be used to predict the
scalar response from the functional input. By analogy with the multivariate situation, the simplest
functional regression model is the functional linear model with scalar response (see Ramsay and
Silverman (2005) for a review).

An interesting methodological approach to functional data is the use of random projections.
The objective is to characterize the behaviour of a functional process, which has infinite dimension,
via the behaviour of the one-dimensional inner products of the functional process with suitable
random functions. This method has interesting applications to goodness-of-fit testing for the
distribution of the process, as can be seen in Cuesta-Albertos et al. (2007). More recently, Patilea
et al. (2012) provided a projection-based test for the effect of a functional covariate in a functional
regression model with scalar response. In their paper, the authors adapt the tests of Zheng (1996) and
Lavergne and Patilea (2008), based on smoothing techniques, to the context of functional covariates.

In this work, a first goodness-of-fit test for the null hypothesis of the functional linear model,
$H_0 : m \in \{\langle \cdot, \beta \rangle : \beta \in \mathbb{H}\}$, where $\mathbb{H}$ is the Hilbert space of square integrable functions, is proposed.
The test statistic is of Cramér-von Mises type and is based on a generalization of a previous test
of Escanciano (2006), designed for regression models with multivariate covariates. The
test statistic is easy to compute using geometrical arguments and simple to calibrate in distribution
by a wild bootstrap on the residuals. Further, although the test is given for the functional
linear model, it can be extended to other functional models with scalar response, as it is based on
the residuals of the model.

This work is organized as follows. Some background on functional data, the functional linear model
and the random projections paradigm is introduced in Section 2. The main part of this work
is Section 3, where the theoretical arguments of the test, jointly with the bootstrap calibration 
procedure, are presented. The finite sample properties of the test are illustrated by a simulation 
study in Section 4. Section 5 illustrates the application of the test to two datasets and introduces 
a graphical tool to evaluate the goodness-of-fit of the functional linear model with scalar response. 
Final comments and conclusions are given in Section 6. An appendix attached contains omitted 
proofs, tables and figures. 

2 Background 

The main goal of this paper is to propose a goodness-of-fit test for the null hypothesis of the 
functional linear model with scalar response. Bearing in mind the different nature of the functional 
variables, some background on functional data, the functional linear model and the use of random 
projections is introduced. 

2.1 Functional data 

One of the first and most important problems when dealing with functional data is to choose a
suitable functional space to work in. The most used functional spaces are metric, Banach and
Hilbert spaces. This is a sequence of functional spaces with increasingly rich structure, where
the tools available in each space are also available in the next. Specifically, in a metric space
we can measure distances between functions; in a Banach space we can, in addition, measure
the size (norm) of functions, and Cauchy sequences are convergent; and finally, in a Hilbert space
we have an inner product, which allows us to consider functional bases.

While there are many types of metric and normed spaces, the $L^p$ spaces are among the most
used. The $L^p[0, 1]$ space, $1 \le p < \infty$, is defined as the set of all functions $f : [0, 1] \to \mathbb{R}$ such
that their norm $\|f\|_p = \left(\int_0^1 |f(t)|^p \, dt\right)^{1/p}$ is finite. The choice of the interval $[0, 1]$ is made only to
fix the integration limits and other intervals can be considered without major changes. The most
important $L^p$ space corresponds to $p = 2$, because it is the only one with an associated inner product
$\langle \cdot, \cdot \rangle$ such that $\|f\|_2 = \langle f, f \rangle^{1/2}$. For two functions $f, g \in L^2[0, 1]$, their inner product is defined as

$$\langle f, g \rangle = \int_0^1 f(t) g(t) \, dt.$$
In what follows we will consider as our working space the Hilbert space $\mathbb{H} = L^2[0, 1]$, bearing in
mind that $[0, 1]$ can be trivially replaced by another interval. The inner product allows for a basis
representation of the elements of $\mathbb{H}$: given a functional basis $\{\Psi_j\}_{j=1}^{\infty}$ of $\mathbb{H}$, any function
$X$ in $\mathbb{H}$ can be expressed as the linear combination $X = \sum_{j=1}^{\infty} x_j \Psi_j$, where $x_j = \langle X, \Psi_j \rangle$, $j \ge 1$. A
basis is said to be orthogonal if $\langle \Psi_i, \Psi_j \rangle = 0$, $i \ne j$, and orthonormal if, in addition, $\langle \Psi_j, \Psi_j \rangle = 1$,
$j \ge 1$. Typical examples of bases of $\mathbb{H}$ are the Fourier basis $\{1, \sin(2\pi j x), \cos(2\pi j x)\}_{j=1}^{\infty}$ and the
B-splines basis (see de Boor (2001)).
For the development of the test statistic, we will also need to introduce a $p$-truncated basis $\{\Psi_j\}_{j=1}^{p}$,
which corresponds to the first $p$ elements of the infinite basis $\{\Psi_j\}_{j=1}^{\infty}$. The representation of $X$ in
this truncated basis is denoted by $X^{(p)} = \sum_{j=1}^{p} x_j \Psi_j$. The choice of the number of basis elements $p$
is crucial to have a reliable representation of the function $X$ by $X^{(p)}$. Although there exist several
methods to select an appropriate $p$, we will refer to the GCV criterion (see Ramsay and Silverman
(2005), page 97) to select $p$ and represent adequately the function $X$ in $\{\Psi_j\}_{j=1}^{p}$. This criterion will
be used in Section 4.1 to select a suitable $p$ for the case of the simple hypothesis.
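The truncated representation can be illustrated numerically. The following is a minimal sketch, assuming NumPy, a trapezoidal approximation of the inner product and an orthonormal rescaling (by $\sqrt{2}$) of the first Fourier functions; the curve `X` and the helper `inner` are hypothetical and chosen only for illustration.

```python
import numpy as np

def inner(f, g, t):
    """Trapezoidal approximation of <f, g> = int_0^1 f(t) g(t) dt."""
    y = f * g
    return np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0

t = np.linspace(0.0, 1.0, 2001)

# Orthonormal version of the first three Fourier functions on [0, 1].
basis = [np.ones_like(t),
         np.sqrt(2.0) * np.sin(2.0 * np.pi * t),
         np.sqrt(2.0) * np.cos(2.0 * np.pi * t)]

# Hypothetical curve that lies in the span of the truncated basis.
X = 1.0 + 3.0 * np.sin(2.0 * np.pi * t)

# Coefficients x_j = <X, Psi_j> and the p-truncated representation X^(p).
coef = np.array([inner(X, psi, t) for psi in basis])
X_p = sum(c * psi for c, psi in zip(coef, basis))
```

Because the example curve belongs to the span of the three basis functions, `X_p` reproduces `X` up to numerical error; for a general curve the quality of the approximation depends on $p$.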

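To deal with functional random projections we will need to define the functional analogue of
the Euclidean $p$-sphere $\mathbb{S}^p = \{\mathbf{x} \in \mathbb{R}^p : \|\mathbf{x}\|_{\mathbb{R}^p} = 1\}$. In the functional case we have the
functional sphere of $\mathbb{H}$, defined as $\mathbb{S}_{\mathbb{H}} = \{f \in \mathbb{H} : \|f\|_{\mathbb{H}} = 1\}$, and the functional sphere of dimension
$p$, which is the set of functions of $\mathbb{H}$ that, expressed in the $p$-truncated basis, have unit norm:
$\mathbb{S}^p_{\mathbb{H}} = \{f = \sum_{j=1}^{p} x_j \Psi_j \in \mathbb{H} : \|f\|_{\mathbb{H}} = 1\}$.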

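The relationship between $\mathbb{S}^p$ and $\mathbb{S}^p_{\mathbb{H}}$ is particularly useful for developing the test. Let
$\boldsymbol{\Psi} = (\langle \Psi_i, \Psi_j \rangle)_{ij}$ be the matrix of inner products of the $p$-truncated basis, $\mathbb{S}^p_{\boldsymbol{\Psi}} = \{\mathbf{x} \in \mathbb{R}^p : \mathbf{x}^T \boldsymbol{\Psi} \mathbf{x} = 1\}$ the
$p$-ellipsoid generated by this matrix and $\mathbf{R}^T \mathbf{R}$ the Cholesky decomposition of $\boldsymbol{\Psi}$ (a positive semidefinite
matrix). First of all, we have the trivial isomorphism that maps elements of $\mathbb{S}^p_{\mathbb{H}}$ to elements of $\mathbb{S}^p_{\boldsymbol{\Psi}}$ by
means of the functional coefficients: $\phi : f = \sum_{j=1}^{p} x_j \Psi_j \in \mathbb{S}^p_{\mathbb{H}} \mapsto \phi(f) = \mathbf{x} \in \mathbb{S}^p_{\boldsymbol{\Psi}}$. Recall that the
functions $\phi$ and $\phi^{-1}$ are well defined because $\|f\|^2_{\mathbb{H}} = \langle \sum_{j=1}^{p} x_j \Psi_j, \sum_{j=1}^{p} x_j \Psi_j \rangle = \mathbf{x}^T \boldsymbol{\Psi} \mathbf{x}$. We must also
consider a linear transformation from $\mathbb{S}^p$ to $\mathbb{S}^p_{\boldsymbol{\Psi}}$, which is given by $\rho : \mathbf{x} \in \mathbb{S}^p \mapsto \rho(\mathbf{x}) = \mathbf{R}^{-1}\mathbf{x} \in \mathbb{S}^p_{\boldsymbol{\Psi}}$
and whose Jacobian is $|\mathbf{R}|^{-1}$, the determinant of the matrix $\mathbf{R}^{-1}$.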

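Using these two transformations, the integration of a functional operator $T$ with respect to a
functional covariate $\gamma^{(p)}$ in $\mathbb{S}^p_{\mathbb{H}}$ can be reduced to a real integration on the $p$-sphere:

$$\int_{\mathbb{S}^p_{\mathbb{H}}} T(\gamma^{(p)}) \, d\gamma^{(p)} = \int_{\mathbb{S}^p_{\boldsymbol{\Psi}}} T\Big(\sum_{j=1}^{p} g_j \Psi_j\Big) \, d\mathbf{g} = \int_{\mathbb{S}^p} |\mathbf{R}|^{-1} \, T\Big(\sum_{j=1}^{p} (\mathbf{R}^{-1}\mathbf{g})_j \Psi_j\Big) \, d\mathbf{g}.$$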

In the case where the basis is orthonormal, $\boldsymbol{\Psi}$ and $\mathbf{R}$ are the identity matrix of order $p$. Then the
coefficients of $\gamma^{(p)} \in \mathbb{S}^p_{\mathbb{H}}$ in the basis $\{\Psi_j\}_{j=1}^{p}$ belong to $\mathbb{S}^p$ without any transformation.
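The change of variables between the Euclidean sphere and the ellipsoid can be checked numerically. The following is a minimal sketch, assuming NumPy and a small non-orthonormal basis chosen only for illustration (the monomials $1, t, t^2$ on $[0, 1]$, whose matrix of inner products is the $3 \times 3$ Hilbert matrix); the names `gram` and `to_ellipsoid` are hypothetical.

```python
import numpy as np

# Gram matrix of the illustrative basis {1, t, t^2} on [0, 1]:
# <t^i, t^j> = 1 / (i + j + 1).
p = 3
gram = np.array([[1.0 / (i + j + 1) for j in range(p)] for i in range(p)])

# numpy returns the lower factor L with gram = L L^T; the factor R of the
# text satisfies gram = R^T R, so R = L^T.
R = np.linalg.cholesky(gram).T

def to_ellipsoid(x):
    """Map a point x of the Euclidean sphere to R^{-1} x on the ellipsoid."""
    return np.linalg.solve(R, x)

rng = np.random.default_rng(0)
x = rng.standard_normal(p)
x /= np.linalg.norm(x)      # a uniformly distributed direction on the sphere
g = to_ellipsoid(x)

# g holds the coefficients of a function with unit norm: g^T gram g = 1.
norm_sq = g @ gram @ g
```

Sampling directions on the Euclidean sphere and mapping them with $\mathbf{R}^{-1}$ is one simple way to obtain projection directions of unit functional norm when the basis is not orthonormal.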

2.2 Functional linear model 

Suppose that $X$ is a functional random variable in $\mathbb{H}$ and $Y$ is a real random variable. If both
variables are centred, i.e., $E[X(t)] = 0$ for a.e. $t \in [0, 1]$ and $E[Y] = 0$, the Functional Linear Model
(FLM) with scalar response states the following relation:

$$Y = \langle X, \beta \rangle + \varepsilon = \int_0^1 X(t) \beta(t) \, dt + \varepsilon,$$

where the functional parameter $\beta$ belongs to $\mathbb{H}$ and $\varepsilon$ is a random variable with zero mean, variance
$\sigma^2$ and such that $E[X(t)\varepsilon] = 0$, $\forall t$. The prediction of $Y$ is done with the conditional expectation of
$Y$ given $X$:

$$m(X) = E[Y | X] = \langle X, \beta \rangle.$$

Saying that $(X, Y)$ follow the functional linear model is equivalent to saying that the regression
function of $Y$ on $X$, $m$, belongs to the family $\mathcal{M} = \{\langle \cdot, \beta \rangle : \beta \in \mathbb{H}\}$.
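Given a sample $(X_1, Y_1), \ldots, (X_n, Y_n)$, the estimation of the functional parameter can be done by
minimising the Residual Sum of Squares (RSS):

$$\hat\beta = \arg\min_{\beta \in \mathbb{H}} \sum_{i=1}^{n} (Y_i - \langle X_i, \beta \rangle)^2.$$

A possible method to search for the parameter $\beta$ that minimises the RSS is to represent the
functional data and the functional parameter in the truncated functional bases $\{\Psi_j\}_{j=1}^{p_X}$ and $\{\theta_j\}_{j=1}^{p_\beta}$,
respectively:

$$X_i^{(p_X)} = \sum_{j=1}^{p_X} c_{ij} \Psi_j, \qquad \beta^{(p_\beta)} = \sum_{j=1}^{p_\beta} b_j \theta_j, \qquad i = 1, \ldots, n.$$

Using the vector notation $\mathbf{X} = (X_i^{(p_X)})_i$, $\mathbf{C} = (c_{ij})_{ij}$, $\boldsymbol{\Psi} = (\Psi_j)_j$, $\mathbf{b} = (b_j)_j$ and $\boldsymbol{\theta} = (\theta_j)_j$, the
previous representation can be expressed as $\mathbf{X} = \mathbf{C}\boldsymbol{\Psi}$ and $\beta^{(p_\beta)} = \boldsymbol{\theta}^T \mathbf{b}$. The functional linear model
results in

$$\mathbf{Y} = \langle \mathbf{X}, \beta \rangle + \boldsymbol{\varepsilon} \approx \mathbf{C}\mathbf{J}\mathbf{b} + \boldsymbol{\varepsilon} = \mathbf{Z}\mathbf{b} + \boldsymbol{\varepsilon}, \tag{1}$$

where $\mathbf{J} = (\langle \Psi_i, \theta_j \rangle)_{ij}$. Thus the basis representation allows the FLM to be expressed as a standard linear
regression, where the estimated coefficients of $\beta$ in the basis $\{\theta_j\}_{j=1}^{p_\beta}$ are given by $\hat{\mathbf{b}} = (\mathbf{Z}^T\mathbf{Z})^{-1}\mathbf{Z}^T\mathbf{Y}$.
Although different combinations of $\{\Psi_j\}_{j=1}^{p_X}$ and $\{\theta_j\}_{j=1}^{p_\beta}$ are possible, the usual choice is $\{\Psi_j\}_{j=1}^{p} = \{\theta_j\}_{j=1}^{p}$,
with $\{\Psi_j\}_{j=1}^{p}$ an orthogonal basis, because in that case the matrix $\mathbf{J}$ is diagonal.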
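The reduction of the FLM to an ordinary linear regression on basis coefficients can be sketched as follows. This is a minimal illustration, assuming NumPy, a common orthonormal basis for the curves and the parameter (so that the matrix of inner products between both bases is the identity) and simulated coefficients; all numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 4

# Coefficients c_ij of each curve X_i in the truncated basis (n x p).
C = rng.standard_normal((n, p))

# Same orthonormal basis for X and beta: J = (<Psi_i, theta_j>) is the identity.
J = np.eye(p)
Z = C @ J

# Hypothetical coefficients of beta; noise-free responses make the check exact.
b_true = np.array([1.0, -0.5, 0.25, 0.0])
Y = Z @ b_true

# Least-squares estimate, equal to (Z^T Z)^{-1} Z^T Y when Z has full column rank.
b_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
residuals = Y - Z @ b_hat
```

A least-squares solver is used here instead of forming $(\mathbf{Z}^T\mathbf{Z})^{-1}$ explicitly, which is usually preferable numerically and gives the same estimate.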

There are several alternatives to represent the functional process and estimate the parameter $\beta$ in
a truncated basis. For instance, a general review of estimation based on basis expansions
such as Fourier series or B-splines can be found in the book by Ramsay and Silverman (2005).
The so-called Functional Principal Component (FPC) regression estimation, proposed by Cardot
et al. (1999), provides an orthogonal data-driven basis that gives the most rapidly convergent
representation of the functional predictor in an $L^2$ sense (see Hall and Horowitz (2007)). Preda
and Saporta (2002) proposed the Functional Partial Least Squares (FPLS) regression method,
which iteratively produces a sequence of orthogonal functions, as the FPC are, but with maximum
predictive performance. To implement any of these methods, it is necessary to fix the
number of basis elements (or components) used in the estimation.

The optimal number of components, $p$, has to be fixed based on the information provided by the
data. To do this, Hall and Hosseini-Nasab (2006) and Preda and Saporta (2002) use the predictive
cross-validation criterion (PCV), Cardot et al. (2003) and Ferraty and Romain (2011) consider the
generalized cross-validation criterion (GCV), and Chiou and Müller (2007) and Febrero-Bande et
al. (2010) consider methods based on the AIC, AICc and BIC information criteria.

Let us denote by $\hat{Y}_i^{(p)} = \langle X_i^{(p)}, \hat\beta^{(p)} \rangle$ and $\hat{Y}_{(-i)}^{(p)} = \langle X_i^{(p)}, \hat\beta_{(-i)}^{(p)} \rangle$ the predictions of $Y_i$ using $p$ components
with the whole sample and with the whole sample excluding the $i$-th element, respectively. The
PCV is defined as:

$$\mathrm{PCV}(p) = \arg\min_p \frac{1}{n} \sum_{i=1}^{n} \left(Y_i - \hat{Y}_{(-i)}^{(p)}\right)^2,$$

which is computationally expensive because it involves estimating $\beta^{(p)}$ $n$ times. This
is especially expensive in the case of data-driven bases (FPC, FPLS) because the basis has to be
recalculated for every datum. As an alternative, GCV avoids recalculating the estimate for every datum
by introducing a penalty term. The GCV is defined as

$$\mathrm{GCV}(p) = \arg\min_p \frac{\sum_{i=1}^{n} \left(Y_i - \hat{Y}_i^{(p)}\right)^2}{n \left(1 - \mathrm{df}(p)/n\right)^2}, \tag{2}$$

where df is the number of degrees of freedom consumed by the model, typically given by the trace
of the hat matrix $\mathbf{Z}(\mathbf{Z}^T\mathbf{Z})^{-1}\mathbf{Z}^T$. GCV is closely related to AIC, AICc and BIC, although they come from different
perspectives.
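The selection of $p$ by generalized cross-validation can be sketched as follows. This is a minimal illustration, assuming NumPy, the standard form of the GCV score (residual sum of squares divided by $n(1 - \mathrm{df}/n)^2$, with df the trace of the hat matrix) and simulated basis coefficients; the function name `gcv` and all numerical values are hypothetical.

```python
import numpy as np

def gcv(Z, Y):
    """GCV score of the least-squares fit of Y on the n x p design Z."""
    n = len(Y)
    H = Z @ np.linalg.pinv(Z)          # hat matrix Z (Z^T Z)^{-1} Z^T
    rss = np.sum((Y - H @ Y) ** 2)     # residual sum of squares
    df = np.trace(H)                   # degrees of freedom consumed by the fit
    return rss / (n * (1.0 - df / n) ** 2)

rng = np.random.default_rng(2)
n, p_max = 200, 8

# Simulated coefficients of the curves; only the first 3 components matter.
C = rng.standard_normal((n, p_max))
b = np.array([2.0, -1.0, 0.5] + [0.0] * (p_max - 3))
Y = C @ b + 0.1 * rng.standard_normal(n)

# Score each truncation level and keep the minimiser.
scores = {p: gcv(C[:, :p], Y) for p in range(1, p_max + 1)}
p_opt = min(scores, key=scores.get)
```

The hat matrix is computed once per candidate $p$, so no leave-one-out refitting is needed, which is the computational advantage over the PCV criterion.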

2.3 Random projections 

Random projections are becoming quite popular when dealing with high-dimensional data, as a way
to overcome the well-known curse of dimensionality. The main idea is to reduce the
dimension and characterize the original distribution of the multidimensional data by the distribution
of the randomly projected univariate data.

In the goodness-of-fit field this is especially interesting, as test procedures tend to become less
efficient and less powerful when the dimension of the model increases. Escanciano (2006) used this
technique to develop a goodness-of-fit test for multivariate regression models based on random
projections. According to his simulation study, the test has excellent power and
the best empirical power in most situations when compared with its competitors.

In the functional framework, it is also possible to consider random projections. Usually, this is
achieved by considering the inner product of the functional variable $X$ of $\mathbb{H}$ and a suitable family of
projectors, i.e. random functions $\gamma$ in $\mathbb{H}$. For example, using this approach Cuesta-Albertos et
al. (2007) developed goodness-of-fit tests for parametric families of functional distributions,
which include goodness-of-fit tests for Gaussianity and for the Black-Scholes model.

A very interesting result on projections can be found in Patilea et al. (2012). In their paper, the 
authors provide a characterization of the conditional expectation of a scalar variable Y with respect 
to a functional variable X given in terms of the conditional expectation of Y with respect to the 
projected X. The result is stated here in the following lemma. 

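Lemma 1 (Patilea et al. (2012)). Let $Y$ be a random variable and $X$ a functional random variable
in the functional space $\mathbb{H}$. The following statements are equivalent:

I. $E[Y | X = x] = 0$, for almost every (a.e.) $x \in \mathbb{H}$.

II. $E[Y | \langle X, \gamma \rangle = u] = 0$, for a.e. $u \in \mathbb{R}$ and $\forall \gamma \in \mathbb{S}_{\mathbb{H}}$.

III. $E[Y | \langle X, \gamma \rangle = u] = 0$, for a.e. $u \in \mathbb{R}$ and $\forall \gamma \in \mathbb{S}^p_{\mathbb{H}}$, $\forall p \ge 1$.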

3 The test 

The presentation of the goodness-of-fit test that we propose in this paper is divided into three
sections. The first and most important presents the theoretical foundations of the test, taking
Lemma 2 as starting point; its proof is detailed in the appendix. The second derives the effective
implementation of the test statistic in practice, using some geometrical and matrix arguments.
Finally, the bootstrap resampling for the calibration of the test statistic is presented in the last
section.
3.1 Theoretical arguments

Let $Y$ be a real random variable and $X$ a functional random variable in the space $\mathbb{H}$. Given a
random sample $\{(X_i, Y_i)\}_{i=1}^{n}$, we are interested in checking whether a functional linear model is suitable
to explain the relation between the functional covariate and the scalar response, i.e., in testing the
composite hypothesis

$$H_0 : m \in \{\langle \cdot, \beta \rangle : \beta \in \mathbb{H}\},$$

versus a general alternative of the form $H_1 : P\{m \notin \{\langle \cdot, \beta \rangle : \beta \in \mathbb{H}\}\} > 0$. Further, the simple
hypothesis, i.e. checking for a specific functional linear model,

$$H_0 : m(X) = \langle X, \beta_0 \rangle, \quad \text{for a fixed } \beta_0 \in \mathbb{H},$$

is also of interest as it includes the important case of no interaction between the functional covariate
and the scalar response (considering $\beta_0(t) = 0$, $\forall t$). In what follows we will focus on the procedure
for the composite hypothesis, given that the simple one is obtained just by considering that the functional
parameter is known and replacing $\hat\beta$ and $\hat\beta^{(p)}$ by $\beta_0$ and $\beta_0^{(p)}$, respectively.

The key point to test the null hypothesis $H_0$ is the following lemma, an adaptation of Lemma
1 to our setting, which gives the characterization of $H_0$ in terms of the random projections of $X$.

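Lemma 2. Let $\beta$ be an element of $\mathbb{H}$. The following statements are equivalent:

I. $m(X) = \langle X, \beta \rangle$, $\forall X \in \mathbb{H}$.

II. $E[Y - \langle X, \beta \rangle | X = x] = 0$, for a.e. $x \in \mathbb{H}$.

III. $E[Y - \langle X, \beta \rangle | \langle X, \gamma \rangle = u] = 0$, for a.e. $u \in \mathbb{R}$ and $\forall \gamma \in \mathbb{S}_{\mathbb{H}}$.

IV. $E[Y - \langle X, \beta \rangle | \langle X, \gamma \rangle = u] = 0$, for a.e. $u \in \mathbb{R}$ and $\forall \gamma \in \mathbb{S}^p_{\mathbb{H}}$, $\forall p \ge 1$.

V. $E[(Y - \langle X, \beta \rangle) 1_{\{\langle X, \gamma \rangle \le u\}}] = 0$, for a.e. $u \in \mathbb{R}$ and $\forall \gamma \in \mathbb{S}_{\mathbb{H}}$.

VI. $E[(Y - \langle X, \beta \rangle) 1_{\{\langle X, \gamma \rangle \le u\}}] = 0$, for a.e. $u \in \mathbb{R}$ and $\forall \gamma \in \mathbb{S}^p_{\mathbb{H}}$, $\forall p \ge 1$.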

Then $H_0$ is characterized by the null value of the moment $E[(Y - \langle X, \beta \rangle) 1_{\{\langle X, \gamma \rangle \le u\}}]$, for a.e. $u \in \mathbb{R}$
and $\forall \gamma \in \mathbb{S}_{\mathbb{H}}$ (or $\forall \gamma \in \mathbb{S}^p_{\mathbb{H}}$, $\forall p \ge 1$), and a possible way to measure the deviation of the data from
$H_0$ is by the empirical process arising from the estimation of this moment:

$$R_n(u, \gamma) = n^{-1/2} \sum_{i=1}^{n} (Y_i - \langle X_i, \hat\beta \rangle) 1_{\{\langle X_i, \gamma \rangle \le u\}}, \tag{3}$$

which will be called the Residual Marked empirical Process based on Projections (RMPP). The
marks of (3) are given by the residuals $\{Y_i - \langle X_i, \hat\beta \rangle\}_{i=1}^{n}$ and the jumps by the projected functional
regressor in the direction $\gamma$, $\{\langle X_i, \gamma \rangle\}_{i=1}^{n}$. The estimation of $\beta$ can be done by different methods, as
described in Section 2. Note that the RMPP depends only on the residuals of the model considered
(in this case the residuals of the FLM) and therefore it can be easily extended to other regression
models (see Section 6 for discussion).
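For a fixed direction, the RMPP is a cumulative sum of residuals ordered by the projected covariate. The following is a minimal sketch in plain Python, assuming that the residuals and the projections $\langle X_i, \gamma \rangle$ have already been computed; the function name `rmpp` and the numerical values are hypothetical.

```python
import math

def rmpp(residuals, projections, u):
    """R_n(u, gamma) = n^{-1/2} * sum of residuals whose projection is <= u."""
    total = sum(e for e, s in zip(residuals, projections) if s <= u)
    return total / math.sqrt(len(residuals))

# Hypothetical residuals and projections <X_i, gamma> for n = 4 observations.
e = [1.0, -2.0, 0.5, 0.5]
proj = [0.3, -1.0, 2.0, 0.0]

value_at_zero = rmpp(e, proj, 0.0)     # uses the observations with proj <= 0
value_at_top = rmpp(e, proj, 10.0)     # uses all observations
```

When the residuals sum to zero, as in this example, the process returns to zero once $u$ exceeds the largest projection.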

To measure the distance of the empirical process (3) from zero, two possibilities are the classical
Cramér-von Mises and Kolmogorov-Smirnov norms, adapted to the projected space $\Pi = \mathbb{R} \times \mathbb{S}_{\mathbb{H}}$:

$$\mathrm{PCvM}_n = \int_{\Pi} R_n(u, \gamma)^2 \, F_{n,\gamma}(du) \, \omega(d\gamma), \tag{4}$$

$$\mathrm{PKS}_n = \sup_{(u, \gamma) \in \Pi} |R_n(u, \gamma)|, \tag{5}$$

where $F_{n,\gamma}$ is the empirical cumulative distribution function (ecdf) of the projected functional data
in the direction $\gamma$ (i.e. the ecdf of the data $\{\langle X_i, \gamma \rangle\}_{i=1}^{n}$) and $\omega$ represents a measure on $\mathbb{S}_{\mathbb{H}}$.
Unfortunately, the infinite dimension of the space $\mathbb{S}_{\mathbb{H}}$ makes it infeasible to compute the functionals (4)
and (5) and some kind of discretization is needed. A solution to this problem is to consider the
properties of the Hilbert space $\mathbb{H}$ and use a basis representation.

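To this end, let us introduce some notation. Let $\{\Psi_j\}_{j=1}^{\infty}$ be a basis of $\mathbb{H}$ and consider the
$p$-truncated basis $\{\Psi_j\}_{j=1}^{p}$, with matrix of inner products $\boldsymbol{\Psi}$. Denote by $X_i^{(p)}$ and $\gamma^{(p)}$ the
representations of the functions $X_i$ and $\gamma$ in the $p$-truncated basis, with vectors of coefficients $\mathbf{x}_{i,p}$ and
$\mathbf{g}_p$, respectively, for $i = 1, \ldots, n$. Using this, and since $\{\Psi_j\}_{j=1}^{\infty}$ is any basis, we have that

$$\langle X_i^{(p)}, \gamma^{(p)} \rangle = \mathbf{x}_{i,p}^T \boldsymbol{\Psi} \mathbf{g}_p.$$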

By analogy with the previously defined $F_{n,\gamma}$, we will denote by $F_{n,\gamma^{(p)}}$ the ecdf of the projected
functional data expressed in the $p$-truncated basis, both for the projector $\gamma$ and for the functional
data. Then, the RMPP can be expressed in terms of a $p$-truncated basis, yielding

$$R_{n,p}(u, \gamma^{(p)}) = n^{-1/2} \sum_{i=1}^{n} (Y_i - \mathbf{x}_{i,p}^T \boldsymbol{\Psi} \mathbf{b}_p) 1_{\{\mathbf{x}_{i,p}^T \boldsymbol{\Psi} \mathbf{g}_p \le u\}} = R_{n,p}(u, \mathbf{g}_p),$$

where $\mathbf{b}_p$ represents the coefficients of $\hat\beta$ in the $p$-truncated basis $\{\Psi_j\}_{j=1}^{p}$.

Bearing this in mind, the test statistic we propose is a modified version of (4) that results from
expressing all the functions in a $p$-truncated basis of $\mathbb{H}$:

$$\mathrm{PCvM}_{n,p} = \int_{\mathbb{S}^p_{\mathbb{H}} \times \mathbb{R}} R_{n,p}(u, \gamma^{(p)})^2 \, F_{n,\gamma^{(p)}}(du) \, \omega(d\gamma^{(p)}). \tag{6}$$

We have chosen the Cramér-von Mises statistic because, as we will see, it presents important
computational advantages and can be adapted to the framework given by Escanciano (2006)
for the finite dimensional case. The most important advantage is that we can derive an explicit
expression that does not require computing the RMPP for different projections, a property that
does not hold for the Kolmogorov-Smirnov statistic.

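Since integration on the $p$-sphere of $\mathbb{H}$ can be expressed as integration on the $p$-sphere
of $\mathbb{R}^p$ via the transformations defined in Section 2.1, we have:

$$\begin{aligned}
\mathrm{PCvM}_{n,p} &= \int_{\mathbb{S}^p_{\boldsymbol{\Psi}} \times \mathbb{R}} R_{n,p}(u, \mathbf{g}_p)^2 \, F_{n,\mathbf{g}_p}(du) \, \omega(d\mathbf{g}_p) \\
&= \int_{\mathbb{S}^p \times \mathbb{R}} |\mathbf{R}|^{-1} R_{n,p}(u, \mathbf{R}^{-1}\mathbf{g}_p)^2 \, F_{n,\mathbf{R}^{-1}\mathbf{g}_p}(du) \, \omega(d\mathbf{g}_p) \\
&= \int_{\mathbb{S}^p \times \mathbb{R}} |\mathbf{R}|^{-1} \Big( n^{-1/2} \sum_{i=1}^{n} (Y_i - \mathbf{x}_{i,p}^T \boldsymbol{\Psi} \mathbf{b}_p) 1_{\{\mathbf{x}_{i,p}^T \mathbf{R}^T \mathbf{g}_p \le u\}} \Big)^2 F_{n,\mathbf{R}^{-1}\mathbf{g}_p}(du) \, \omega(d\mathbf{g}_p),
\end{aligned} \tag{7}$$

where $\omega$ now represents a measure on the $p$-sphere $\mathbb{S}^p$ that, for simplicity, will be taken
to be the uniform distribution on $\mathbb{S}^p$.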

Essentially, what we have done is to treat the functional process as a $p$-variate process,
expressing the functions in a basis of $p$ elements. The methods to choose the number of elements $p$
and to estimate the parameter $\beta$, both for the simple and for the composite hypothesis, are the ones
introduced in Section 2. These methods will be illustrated in Section 4.
3.2 Implementation

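Following the steps of Escanciano (2006) it is possible to derive a simpler expression for (7).
Using the definition of the RMPP in a $p$-truncated basis, the fact that $F_{n,\mathbf{R}^{-1}\mathbf{g}_p}$ is the ecdf of
$\{\mathbf{x}_{i,p}^T \boldsymbol{\Psi} \mathbf{R}^{-1} \mathbf{g}_p\}_{i=1}^{n} = \{\mathbf{x}_{i,p}^T \mathbf{R}^T \mathbf{g}_p\}_{i=1}^{n}$ and some simple algebra, we have:

$$\begin{aligned}
\mathrm{PCvM}_{n,p} &= \int_{\mathbb{S}^p \times \mathbb{R}} |\mathbf{R}|^{-1} R_{n,p}(u, \mathbf{R}^{-1}\mathbf{g}_p)^2 \, F_{n,\mathbf{R}^{-1}\mathbf{g}_p}(du) \, d\mathbf{g}_p \\
&= n^{-1} \sum_{i=1}^{n} \sum_{j=1}^{n} \hat\varepsilon_i \hat\varepsilon_j \int_{\mathbb{S}^p \times \mathbb{R}} |\mathbf{R}|^{-1} 1_{\{\mathbf{x}_{i,p}^T \mathbf{R}^T \mathbf{g}_p \le u\}} 1_{\{\mathbf{x}_{j,p}^T \mathbf{R}^T \mathbf{g}_p \le u\}} \, F_{n,\mathbf{R}^{-1}\mathbf{g}_p}(du) \, d\mathbf{g}_p \\
&= n^{-2} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{r=1}^{n} \hat\varepsilon_i \hat\varepsilon_j A_{ijr},
\end{aligned}$$

with $\hat\varepsilon_i = Y_i - \langle X_i^{(p)}, \hat\beta^{(p)} \rangle$. The terms $A_{ijr}$ represent the integrals

$$\begin{aligned}
A_{ijr} &= \int_{\mathbb{S}^p} |\mathbf{R}|^{-1} 1_{\{\mathbf{x}_{i,p}^T \mathbf{R}^T \mathbf{g}_p \le \mathbf{x}_{r,p}^T \mathbf{R}^T \mathbf{g}_p\}} 1_{\{\mathbf{x}_{j,p}^T \mathbf{R}^T \mathbf{g}_p \le \mathbf{x}_{r,p}^T \mathbf{R}^T \mathbf{g}_p\}} \, d\mathbf{g}_p \\
&= |\mathbf{R}|^{-1} \int_{\mathbb{S}^p} 1_{\{(\mathbf{R}\mathbf{x}_{i,p} - \mathbf{R}\mathbf{x}_{r,p})^T \mathbf{g}_p \le 0, \; (\mathbf{R}\mathbf{x}_{j,p} - \mathbf{R}\mathbf{x}_{r,p})^T \mathbf{g}_p \le 0\}} \, d\mathbf{g}_p \\
&= |\mathbf{R}|^{-1} \int_{S_{ijr}} d\mathbf{g}_p,
\end{aligned}$$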

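where $S_{ijr} = \{\boldsymbol{\xi} \in \mathbb{S}^p : \pi/2 \le \angle(\mathbf{x}'_{i,p} - \mathbf{x}'_{r,p}, \boldsymbol{\xi}) \le 3\pi/2, \; \pi/2 \le \angle(\mathbf{x}'_{j,p} - \mathbf{x}'_{r,p}, \boldsymbol{\xi}) \le 3\pi/2\}$ and $\angle(\mathbf{a}, \mathbf{b})$
represents the angle between vectors $\mathbf{a}$ and $\mathbf{b}$. To simplify notation, we denote $\mathbf{x}'_{k,p} = \mathbf{R}\mathbf{x}_{k,p}$ ($\mathbf{x}'_{k,p} = \mathbf{x}_{k,p}$
if the basis is orthonormal) for $k = 1, \ldots, n$. Depending on $\mathbf{x}'_{i,p}$, $\mathbf{x}'_{j,p}$ and $\mathbf{x}'_{r,p}$, the region $S_{ijr}$ can be
the whole sphere $\mathbb{S}^p$ ($\mathbf{x}'_{i,p} = \mathbf{x}'_{j,p} = \mathbf{x}'_{r,p}$), a hemisphere of $\mathbb{S}^p$ ($\mathbf{x}'_{i,p} = \mathbf{x}'_{j,p}$, $\mathbf{x}'_{i,p} = \mathbf{x}'_{r,p}$ or $\mathbf{x}'_{j,p} = \mathbf{x}'_{r,p}$)
or a spherical wedge (see Figure B1 in appendix) of width angle given by

$$\pi - \arccos\left( \frac{(\mathbf{x}'_{i,p} - \mathbf{x}'_{r,p})^T (\mathbf{x}'_{j,p} - \mathbf{x}'_{r,p})}{\|\mathbf{x}'_{i,p} - \mathbf{x}'_{r,p}\| \, \|\mathbf{x}'_{j,p} - \mathbf{x}'_{r,p}\|} \right). \tag{8}$$

Thus $A_{ijr}$ is the product of the surface area of a spherical wedge of angle $A^{(0)}_{ijr}$ times $|\mathbf{R}|^{-1}$, and is
given by

$$A_{ijr} = A^{(0)}_{ijr} \frac{\pi^{p/2 - 1}}{\Gamma(p/2)} |\mathbf{R}|^{-1}, \qquad A^{(0)}_{ijr} = \begin{cases} 2\pi, & \mathbf{x}'_{i,p} = \mathbf{x}'_{j,p} = \mathbf{x}'_{r,p}, \\ \pi, & \mathbf{x}'_{i,p} = \mathbf{x}'_{j,p}, \ \mathbf{x}'_{i,p} = \mathbf{x}'_{r,p} \text{ or } \mathbf{x}'_{j,p} = \mathbf{x}'_{r,p}, \\ (8), & \text{else.} \end{cases}$$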

We also have a symmetry property, $A_{ijr} = A_{jir}$, which reduces the evaluation of the test statistic
from $O(n^3)$ to $O((n^3 + n^2)/2)$ computations. The memory requirement is expensive, because we
need to store the $(n^3 + n^2)/2$ elements of the three-dimensional array $\mathbf{A}$, which is symmetric in its
first two indexes. However, this requirement can be relaxed if we consider the following expression
for the statistic:

$$\mathrm{PCvM}_{n,p} = n^{-2} \hat{\boldsymbol{\varepsilon}}^T \mathbf{A}_\bullet \hat{\boldsymbol{\varepsilon}}, \tag{9}$$

where $\mathbf{A}_\bullet = \left(\sum_{r=1}^{n} A_{ijr}\right)_{ij}$ is an $n \times n$ matrix and $\hat{\boldsymbol{\varepsilon}}$ is the vector of residuals. By the definition
of $A_{ijr}$ and its symmetry in the first two entries, the matrix $\mathbf{A}_\bullet$ is symmetric and its diagonal
terms are given by $(n + 1) \frac{\pi^{p/2}}{\Gamma(p/2)} |\mathbf{R}|^{-1}$. Although the order of computations remains similar, $O((n^3 - n^2)/2)$,
the memory required for storing the matrix $\mathbf{A}_\bullet$ is substantially lower and drops to $(n^2 - n + 2)/2$
elements. This drastically improves the computation time of the statistic and allows the test
to be applied to larger datasets.
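The computation of the matrix of summed wedge terms and of the quadratic form for the statistic can be sketched as follows. This is a minimal, unoptimised illustration, assuming NumPy, coefficient vectors that have already been multiplied by $\mathbf{R}$ (or an orthonormal basis, for which $\mathbf{R}$ is the identity) and simulated inputs; the function names `wedge_angle`, `pcvm_matrix` and `pcvm` are hypothetical and this is not the authors' implementation.

```python
import numpy as np
from math import gamma, pi

def wedge_angle(xi, xj, xr):
    """Angle of the region S_ijr: 2*pi, pi, or the wedge angle of (8)."""
    a, b = xi - xr, xj - xr
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0.0 and nb == 0.0:
        return 2.0 * pi                       # whole sphere
    if na == 0.0 or nb == 0.0 or np.array_equal(xi, xj):
        return pi                             # hemisphere
    cos = np.clip(a @ b / (na * nb), -1.0, 1.0)
    return pi - np.arccos(cos)                # spherical wedge

def pcvm_matrix(X, det_R=1.0):
    """Matrix (sum_r A_ijr)_ij for the n x p array X of transformed coefficients."""
    n, p = X.shape
    const = pi ** (p / 2 - 1) / (gamma(p / 2) * det_R)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):                 # symmetry in the first two indexes
            s = sum(wedge_angle(X[i], X[j], X[r]) for r in range(n))
            A[i, j] = A[j, i] = const * s
    return A

def pcvm(residuals, A):
    """Statistic n^{-2} e^T A e."""
    n = len(residuals)
    return residuals @ A @ residuals / n ** 2

rng = np.random.default_rng(3)
n, p = 15, 3
X = rng.standard_normal((n, p))               # simulated coefficient vectors
e = rng.standard_normal(n)                    # simulated residuals
A = pcvm_matrix(X)
stat = pcvm(e, A)
```

The matrix depends only on the covariates, so it can be computed once and reused for every set of residuals, which is what makes the bootstrap calibration affordable.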



Again, let us remark that the expression derived for the $\mathrm{PCvM}_{n,p}$ statistic remains valid for any
functional regression model with scalar response and not just for the FLM, as the expression is
based on the residuals of the model.

3.3 Bootstrap resampling 

To calibrate the distribution of the statistic $\mathrm{PCvM}_{n,p}$ under the null hypothesis, a wild bootstrap
on the residuals is applied. This bootstrap procedure is consistent in the finite dimensional case, as
shown in Stute et al. (1998), and is suitable for situations with potential heteroscedasticity,
quite common in functional data. The resampling process for the case of the composite hypothesis,
given an initial estimation of the functional parameter, is the following:

I. Construct the estimated residuals: $\hat\varepsilon_i = Y_i - \langle X_i^{(p)}, \hat\beta^{(p)} \rangle$, $i = 1, \ldots, n$.

II. Draw independent random variables $V_1^*, \ldots, V_n^*$ satisfying $E^*[V_i^*] = 0$ and $E^*[V_i^{*2}] = 1$. For
example, if $V^*$ is a discrete random variable with distribution weights $P\{V^* = (1 - \sqrt{5})/2\} = (5 + \sqrt{5})/10$ and
$P\{V^* = (1 + \sqrt{5})/2\} = (5 - \sqrt{5})/10$, we have the golden section bootstrap.

III. Construct the bootstrap residuals $\varepsilon_i^* = V_i^* \hat\varepsilon_i$, $i = 1, \ldots, n$.

IV. Set $Y_i^* = \langle X_i^{(p)}, \hat\beta^{(p)} \rangle + \varepsilon_i^*$, $i = 1, \ldots, n$, and estimate $\hat\beta^{*,(p)}$ for the sample $\{(X_i, Y_i^*)\}_{i=1}^{n}$.

V. Obtain the estimated bootstrap residuals $\hat\varepsilon_i^* = Y_i^* - \langle X_i^{(p)}, \hat\beta^{*,(p)} \rangle$, $i = 1, \ldots, n$.

Then, the procedure to calibrate the test is the following. In step I we compute the test statistic
with the residuals under $H_0$ using the implementation (9) of the previous section. Then steps
II-V are repeated for $b = 1, \ldots, B$, computing each time the bootstrap statistic $\mathrm{PCvM}^{*,b}_{n,p} = n^{-2} (\hat{\boldsymbol{\varepsilon}}^{*,b})^T \mathbf{A}_\bullet \hat{\boldsymbol{\varepsilon}}^{*,b}$, and
the $p$-value of the test is estimated by Monte Carlo: $\#\{\mathrm{PCvM}_{n,p} \le \mathrm{PCvM}^{*,b}_{n,p}\}/B$. For computational
efficiency, it is important to note that we do not have to compute the matrix $\mathbf{A}_\bullet$ again in the
bootstrap replicates.

A very useful fact about the FLM is that step V can be easily performed using the properties of the
estimation of $\hat\beta^{(p)}$. From (1) it is clear that the vector of coefficients of $\hat\beta^{(p)}$ is estimated through
$\hat{\mathbf{b}} = (\mathbf{Z}^T\mathbf{Z})^{-1}\mathbf{Z}^T\mathbf{Y}$. Then, the estimated bootstrap residuals, represented by the vector $\hat{\boldsymbol{\varepsilon}}^*$, can be
obtained as $\hat{\boldsymbol{\varepsilon}}^* = (\mathbf{I}_n - \mathbf{Z}(\mathbf{Z}^T\mathbf{Z})^{-1}\mathbf{Z}^T)\mathbf{Y}^*$, where $\mathbf{Y}^*$ is the vector of bootstrap responses given
in step IV and $\mathbf{I}_n$ is the identity matrix of order $n$ (the matrix $\mathbf{Z}$ is $n \times p$). The projection matrix $\mathbf{I}_n - \mathbf{Z}(\mathbf{Z}^T\mathbf{Z})^{-1}\mathbf{Z}^T$
remains the same for all the bootstrap replicates, so it can be stored and need not be computed
again. Obtaining the residuals in this way yields a significant computational saving.

The bootstrap resampling in the case of the simple hypothesis is easier: just replace $\hat\beta^{(p)}$ by $\beta_0^{(p)}$
and omit steps IV and V, taking $\hat\varepsilon_i^* = \varepsilon_i^*$, $i = 1, \ldots, n$.
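The resampling scheme for the composite hypothesis can be sketched as follows. This is a minimal illustration, assuming NumPy, a design matrix `Z` of basis coefficients and a precomputed symmetric matrix `A` for the quadratic form; in the usage lines the identity matrix is used as a stand-in for that matrix purely to exercise the code, and the function names and numerical values are hypothetical.

```python
import numpy as np

SQRT5 = np.sqrt(5.0)
# Golden section weights: two-point distribution with mean 0 and variance 1.
VALUES = np.array([(1.0 - SQRT5) / 2.0, (1.0 + SQRT5) / 2.0])
PROBS = np.array([(5.0 + SQRT5) / 10.0, (5.0 - SQRT5) / 10.0])

def golden_weights(n, rng):
    """Draw n independent golden section multipliers V*."""
    return rng.choice(VALUES, size=n, p=PROBS)

def bootstrap_pvalue(Z, Y, A, B=200, seed=0):
    """Wild bootstrap p-value for the statistic n^{-2} e^T A e."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    H = Z @ np.linalg.pinv(Z)            # hat matrix
    M = np.eye(n) - H                    # residual maker, fixed across replicates
    fitted = H @ Y
    e_hat = Y - fitted                   # step I
    stat = e_hat @ A @ e_hat / n ** 2
    count = 0
    for _ in range(B):
        e_star = golden_weights(n, rng) * e_hat      # steps II and III
        Y_star = fitted + e_star                     # step IV
        e_hat_star = M @ Y_star                      # step V, no refitting needed
        count += int(stat <= e_hat_star @ A @ e_hat_star / n ** 2)
    return stat, count / B

rng = np.random.default_rng(4)
n, p = 40, 3
Z = rng.standard_normal((n, p))
Y = Z @ np.array([1.0, 0.0, -1.0]) + 0.2 * rng.standard_normal(n)
A = np.eye(n)    # stand-in for the matrix of the statistic, for illustration only
stat, pval = bootstrap_pvalue(Z, Y, A)
```

Storing the residual-maker matrix once avoids refitting the model in every replicate, which mirrors the computational saving described for step V.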



4 Simulation study 

To illustrate the finite sample properties of the proposed test, a simulation study was carried out for
the simple and the composite hypotheses. The functional process considered for the functional
covariate $X$ is an Ornstein-Uhlenbeck process in $[0, 1]$, which corresponds to a Brownian motion with
functional mean $\mu$ and covariance function given by $\mathrm{Cov}(X(s), X(t)) = \frac{\sigma^2}{2\theta} e^{-\theta(s+t)} \left(e^{2\theta \min(s,t)} - 1\right)$.
We have considered $\theta = 1/3$, $\sigma = 1$ and the functional mean $\mu(t) = 0$, $\forall t \in [0, 1]$. See Figure B2 in
appendix for further details.

All the functional data in this simulation study are represented at 201 equidistant points in the
interval $[0, 1]$. The number of bootstrap replicates considered is $B = 1000$ and the number of Monte
Carlo replicates for determining the empirical sizes and powers is $M = 1000$. The sample size, unless
otherwise stated, is $n = 100$. Lastly, in order to properly compare the effect of the kind of basis,
the number of elements and the sample sizes, the initial seed for the random generation of the
underlying functional process is the same for each model.
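Curves with the covariance structure described above can be generated on an equispaced grid as follows. This is a minimal sketch, assuming NumPy, a zero-mean process started at $X(0) = 0$ and the exact Gaussian transition of the Ornstein-Uhlenbeck process; the parameter values in the usage lines are illustrative only and the function names are hypothetical.

```python
import numpy as np

def ou_cov(s, t, theta, sigma):
    """Covariance of the zero-mean OU process started at X(0) = 0."""
    return sigma ** 2 / (2.0 * theta) * np.exp(-theta * (s + t)) * (
        np.exp(2.0 * theta * np.minimum(s, t)) - 1.0)

def simulate_ou(n, grid, theta, sigma, rng):
    """Draw n paths on `grid` (with grid[0] = 0) using the exact transition."""
    X = np.zeros((n, len(grid)))
    for k in range(1, len(grid)):
        dt = grid[k] - grid[k - 1]
        decay = np.exp(-theta * dt)
        sd = sigma * np.sqrt((1.0 - decay ** 2) / (2.0 * theta))
        X[:, k] = decay * X[:, k - 1] + sd * rng.standard_normal(n)
    return X

rng = np.random.default_rng(5)
grid = np.linspace(0.0, 1.0, 201)        # 201 equidistant points in [0, 1]
paths = simulate_ou(10000, grid, theta=0.5, sigma=1.0, rng=rng)

# Empirical second moments, to compare with the theoretical covariance.
var_end = paths[:, -1].var()
cov_mid_end = np.mean(paths[:, 100] * paths[:, -1])
```

Fixing the seed of the generator, as done here, makes the simulated curves reproducible across models, in the spirit of the common seed used in the study.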

Several lengthy tables have been reduced in this section for space saving. The reader is referred to 
the appendix to see the whole tables as well as other explanatory figures. 



4.1 Testing for simple hypothesis 

The simulation study for the simple hypothesis is centred on the case $H_0 : m(X) = \langle X, \beta_0 \rangle$, where
$\beta_0(t) = 0$, $t \in [0, 1]$. This is equivalent to testing that the functional covariate $X$ has no effect on the
scalar response, i.e., testing the null hypothesis $H_0 : m(X) = 0$. Although there is an extensive
collection of goodness-of-fit tests for finite dimensional covariates (see González-Manteiga and Crujeiras
(2011)), the literature for the case of functional covariates is more limited. Therefore, we will focus
on the competing procedures of Delsol et al. (2011) and González-Manteiga et al. (2012) to compare
the different tests in terms of level and power. Let us describe briefly these two test statistics.

Delsol et al. (2011) propose a test statistic for $H_0 : m(X) = m_0(X)$, deriving its asymptotic law
and giving a bootstrap procedure based on the residuals. The statistic, inspired by the proposal of
Härdle and Mammen (1993), is

$$T_n = \int \left( \sum_{i=1}^{n} (Y_i - m_0(X_i)) K\!\left(\frac{d(X, X_i)}{h}\right) \right)^2 \omega(X) \, dP_X(X),$$

where $K$ is a kernel function, $d$ is a semimetric and $h$ is the bandwidth parameter. $P_X$ represents
the probability distribution of the functional process and $\omega$ is a suitable weight function. The test
used in our implementation results from considering no functional effect, i.e. $H_0 : m_0(X) = 0$, and
from approximating the integral with respect to $dP_X$ by the empirical mean of the sample. We have
also considered the kernel $K(t) = 2\phi(|t|)$, where $\phi$ is the density of a $\mathcal{N}(0, 1)$, the $L^2$ distance in
$\mathbb{H}$ for $d$ and the uniform weight function. The bandwidth parameter is given by the PCV criterion
and bootstrap resampling was done using the golden section wild bootstrap.

The other competing test is the one proposed by González-Manteiga et al. (2012) and is based on
the idea of extending the covariance to functional-scalar data:

$$D_n = \left\| \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) \right\|_{\mathbb{H}},$$

where $\bar{X}$ is the functional mean of $\{X_i\}_{i=1}^{n}$ and $\bar{Y}$ is the usual scalar mean of $\{Y_i\}_{i=1}^{n}$. The authors
extend the ideas of the classical F-test to the functional framework, resulting in a statistic to test the
null hypothesis of no interaction inside the functional linear model. The test is consistent and the
authors derived the asymptotic distribution of the process $\frac{1}{\sqrt{n}} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$, resulting in
a Brownian motion with mean $E[(X - \mu_X)(Y - \mu_Y)]$ and a particular covariance structure. This
test can be viewed as a possible benchmark in our simulation study and, recalling its similarity with
the classical F-test, will be denoted as the functional F-test. The bootstrap resampling was also
performed using the golden section wild bootstrap.

Three different blocks of deviations from the null are considered. The first two blocks represent
a deviation inside the linear model, i.e., considering different functions $\beta_{j,k}$, $j = 1, 2$, $k = 1, 2, 3$,
instead of $\beta_0$. The functions are $\beta_{1,k}(t) = \gamma_k \cdot (t - 0.5)$, with coefficients $\gamma_1 = 0.25$, $\gamma_2 = 0.65$
and $\gamma_3 = 1.00$ for $H_{1,k}$, and $\beta_{2,k}(t) = \eta_k \cdot \sin(2\pi t^3)^3$, with $\eta_1 = 0.10$, $\eta_2 = 0.20$ and $\eta_3 = 0.50$ for
$H_{2,k}$. The third block of deviations from the null hypothesis consists of adding a second order
term $\langle X, X \rangle$ to the regression function, so that the model is no longer linear. Different weights for the
second order term are represented in the alternatives $H_{3,k}$ : $Y = \langle X, \beta_0 \rangle + \delta_k \langle X, X \rangle + \varepsilon$, where $\delta_1 = 0.005$,
$\delta_2 = 0.010$ and $\delta_3 = 0.015$. The relation between the variance of the error and the
variance of the response can be measured by the signal-to-noise ratio: $\mathrm{snr} = \sigma^2 / (\sigma^2 + E[m(X)^2])$.
For block 1 the snr's of the alternatives are 0.956, 0.765 and 0.579, respectively for $H_{1,k}$, $k = 1, 2, 3$.
For block 2, the snr's are 0.981, 0.850 and 0.671. For block 3, we have 0.985, 0.914 and 0.728.
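The regression functions of the three blocks of alternatives can be written compactly in code. The following is a minimal sketch, assuming NumPy, curves observed on a common grid and a trapezoidal approximation of the inner product; the helper names and the rough example curves are hypothetical and are not the process used in the study.

```python
import numpy as np

def integrate(y, t):
    """Trapezoidal rule along the last axis."""
    return np.sum((y[..., 1:] + y[..., :-1]) * np.diff(t), axis=-1) / 2.0

def beta_1(t, gamma_k):
    """Block 1: beta(t) = gamma_k * (t - 0.5)."""
    return gamma_k * (t - 0.5)

def beta_2(t, eta_k):
    """Block 2: beta(t) = eta_k * sin(2 pi t^3)^3."""
    return eta_k * np.sin(2.0 * np.pi * t ** 3) ** 3

def regression(X, t, beta=None, delta=0.0):
    """m(X) = <X, beta> + delta * <X, X>; beta=None stands for beta_0 = 0."""
    linear = 0.0 if beta is None else integrate(X * beta, t)
    return linear + delta * integrate(X * X, t)

t = np.linspace(0.0, 1.0, 201)
rng = np.random.default_rng(6)
# Hypothetical rough curves (scaled random walks), for illustration only.
X = np.cumsum(rng.standard_normal((5, t.size)), axis=1) * np.sqrt(t[1])

m_h0 = regression(X, t)                              # null: no effect
m_h1 = regression(X, t, beta=beta_1(t, 0.25))        # linear deviation
m_h3 = regression(X, t, delta=0.015)                 # second order deviation
```

Adding independent noise to these regression values yields responses under the null or under each alternative; the noise distribution is left out here because it is a separate choice of the simulation design.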

In the case of the simple hypothesis there is no estimation of the parameter $\beta_0$, as it is known.
However, it is necessary to express the functional process $\mathcal{X}$ and the function $\beta_0$ in a suitable
basis in order to compute the test statistic. To this end, we consider a B-splines basis and we
choose its number of elements automatically by the GCV criterion described in Section 2.1.

The results of the study for the simple hypothesis are collected in Table 1, which shows the empirical
sizes and powers of the functional F-test, the test of Delsol et al. and the PCvM test for the simple
hypothesis, for the models described above. All of the tests seem to calibrate the significance
level $\alpha = 0.05$. With respect to the power, the functional F-test has on average a superior
behaviour in the alternatives $H_{2,k}$, which represent deviations from the null inside a linear model.
The test of Delsol et al. also performs well with the cross-validatory bandwidth, being the most
competitive for the block $H_{1,k}$. The PCvM test performs worse than the functional F-test for the
alternatives $H_{1,k}$ and $H_{2,k}$, and similarly to the test of Delsol et al. for $H_{2,k}$. Nevertheless, for
alternatives that are not inside the linear model, the PCvM test is the most powerful. Similar results
are obtained with a noise given by a recentred exponential distribution with parameter $\lambda = 10$.



             N(0, 0.10^2) noise                    Recentred Exp(10) noise
Models       F-test   PCvM     Delsol et al.       F-test   PCvM     Delsol et al.
H_0          0.060    0.041    0.065               0.043    0.051    0.066
H_{1,1}      0.060    0.069    0.098               0.056    0.052    0.072
H_{1,2}      0.163    0.078    0.309               0.180    0.085    0.285
H_{1,3}      0.401    0.138    0.772               0.442    0.166    0.719
H_{2,1}      0.248    0.053    0.080               0.265    0.071    0.089
H_{2,2}      0.951    0.336    0.403               0.932    0.343    0.420
H_{2,3}      1.000    0.904    0.877               0.999    0.901    0.848
H_{3,1}      0.034    0.173    0.165               0.052    0.125    0.128
H_{3,2}      0.038    0.691    0.554               0.034    0.721    0.558
H_{3,3}      0.019    0.998    0.932               0.012    1.000    0.967

Table 1: Empirical power of the competing tests for the simple hypothesis $H_0 : m(\mathcal{X}) = \langle \mathcal{X}, \beta_0 \rangle$,
$\beta_0(t) = 0$, $\forall t$, and significance level $\alpha = 0.05$. Noise follows a $\mathcal{N}(0, 0.10^2)$ (first three columns)
and a recentred Exp(10) (last three columns).






4.2 Testing for composite hypothesis 



To see the performance of the test under the composite hypothesis $H_0 : m \in \{\langle \cdot, \beta \rangle : \beta \in \mathbb{H}\}$ we
have considered three different null models of the form

$$H_{j,0} : Y = \langle \mathcal{X}, \beta_j \rangle + \varepsilon,$$    (10)

with j = 1, 2, 3 being the index of the three different models. The functional coefficients of the three
FLM are $\beta_1(t) = \sin(2\pi t) - \cos(2\pi t)$, $\beta_2(t) = t - (t - 0.75)^2$ and $\beta_3(t) = t + \cos(2\pi t)$, $t \in [0, 1]$.
The second functional coefficient is chosen to be perfectly described by B-splines, whereas this is
not the case for $\beta_1$ and $\beta_3$.

In order to check the power performance of the test, a set of possible deviations from the linear
regression model is considered. Again, a second order term $\langle \mathcal{X}, \mathcal{X} \rangle$ is introduced to transform the
model into a non-linear one. Three different weights for this term are considered, representing the
alternatives $H_{j,k}$:

$$H_{j,k} : Y = \langle \mathcal{X}, \beta_j \rangle + \delta_k \langle \mathcal{X}, \mathcal{X} \rangle + \varepsilon.$$    (11)

The index for the model is denoted by j = 1, 2, 3 and k = 1, 2, 3 is the index that measures the
degree of the deviation from the null hypothesis. The weights of the quadratic term are $\delta_1 = 0.01$,
$\delta_2 = 0.05$ and $\delta_3 = 0.10$. The snr's for model 1 are 0.177, 0.176, 0.166 and 0.140, respectively for
$H_{1,k}$, k = 0, 1, 2, 3. For model 2, 0.050, 0.050, 0.050 and 0.047. For model 3, we have 0.029, 0.029,
0.029 and 0.028.

Three estimation methods for the functional parameter $\beta$ will be considered. All of them are
designed to provide automatic selectors of the number of elements considered in the basis
estimation of $\beta$. The first automatic method considered is the estimation of $\beta$ as a linear
combination of a B-splines basis of p elements, where p is chosen by the GCV criterion (2). Secondly,
FPC estimation relies on the BIC criterion to choose the optimal number of elements in the FPC
basis derived from the process to estimate $\beta$. Finally, the FPLS method uses PCV to select
the adequate number of elements in the FPLS basis derived from the joint sample $\{(\mathcal{X}_i, Y_i)\}_{i=1}^n$.

Table 2 shows the rejection frequencies of the null hypothesis for the test computed from
observations of the null hypotheses (10) and deviations (11), for the significance level $\alpha = 0.05$. The
rejection rates were computed for the three types of estimation of the functional coefficient and
basis representation, in order to see the possible effects of the estimation method on the power
performance. In view of the rejection frequencies for the three models, several comments are in
order. Firstly, the test respects the significance level under the null hypothesis for the three
estimation methods considered. Secondly, there seem to be no big differences in terms of power
among the three methods, although it can be observed that the FPC and FPLS estimation methods are
slightly more conservative. Finally, in view of the similarities between the response under the null
and under the alternatives (see Figure B6 in the appendix), the results of Table 2 point toward a
quite competitive test. Similar results are obtained with a non-symmetric random noise.

The behaviour of the test for different sample sizes is shown in Table 3. As in the previous tables, the
three estimating methods have very similar rejection ratios and we can see that B-splines estimation
has again larger rejection ratios for all the models. As expected, when the sample size increases,
the rejection rates also increase.
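
The null models (10) and the quadratic deviations (11) can be simulated on a discretization grid as in the following Python sketch. It is a minimal illustration under stated assumptions: the curves are taken to be Brownian motion paths on [0, 1] (the process used in the study is not restated in this section), inner products are approximated by the trapezoidal rule, and all function names are hypothetical.

```python
import math
import random


def trapezoid(values, grid):
    """Trapezoidal rule for a function sampled on grid."""
    return sum(0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
               for i in range(len(grid) - 1))


def inner(f, g, grid):
    """Approximate L2 inner product <f, g> of two discretized curves."""
    return trapezoid([a * b for a, b in zip(f, g)], grid)


# Functional coefficients beta_1, beta_2, beta_3 and weights delta_k of (11);
# k = 0 corresponds to the null models (10).
BETAS = {
    1: lambda t: math.sin(2 * math.pi * t) - math.cos(2 * math.pi * t),
    2: lambda t: t - (t - 0.75) ** 2,
    3: lambda t: t + math.cos(2 * math.pi * t),
}
DELTAS = {0: 0.0, 1: 0.01, 2: 0.05, 3: 0.10}


def brownian_path(grid, rng):
    """Assumed covariate: a Brownian motion path started at zero."""
    path = [0.0]
    for i in range(1, len(grid)):
        step_sd = math.sqrt(grid[i] - grid[i - 1])
        path.append(path[-1] + rng.gauss(0.0, step_sd))
    return path


def simulate_response(x, grid, j, k, sigma, rng):
    """Y = <X, beta_j> + delta_k <X, X> + eps, with eps ~ N(0, sigma^2)."""
    beta = [BETAS[j](t) for t in grid]
    signal = inner(x, beta, grid) + DELTAS[k] * inner(x, x, grid)
    return signal + rng.gauss(0.0, sigma)
```

For example, with `grid = [i / 100 for i in range(101)]` and `rng = random.Random(0)`, calling `simulate_response(brownian_path(grid, rng), grid, 1, 2, 0.10, rng)` gives one response under the alternative $H_{1,2}$.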
             N(0, 0.10^2) noise                Recentred Exp(10) noise
Models       B-splines   FPC      FPLS         B-splines   FPC      FPLS
H_{1,0}      0.061       0.052    0.059        0.039       0.046    0.046
H_{1,1}      0.094       0.082    0.078        0.074       0.072    0.077
H_{1,2}      0.747       0.732    0.715        0.737       0.721    0.720
H_{1,3}      0.997       0.997    0.996        0.996       0.997    0.996
H_{2,0}      0.058       0.045    0.050        0.041       0.035    0.033
H_{2,1}      0.086       0.071    0.074        0.081       0.080    0.078
H_{2,2}      0.745       0.722    0.720        0.743       0.724    0.718
H_{2,3}      0.997       0.996    0.997        0.994       0.995    0.994
H_{3,0}      0.054       0.046    0.044        0.052       0.040    0.038
H_{3,1}      0.082       0.077    0.075        0.072       0.062    0.062
H_{3,2}      0.764       0.752    0.750        0.735       0.737    0.721
H_{3,3}      0.999       0.998    0.998        0.998       0.998    0.997

Table 2: Empirical power of the PCvM test for the composite hypothesis $H_0 : m \in \{\langle \cdot, \beta \rangle : \beta \in \mathbb{H}\}$ and for
three estimating methods of $\beta$ at significance level $\alpha = 0.05$ with noise $\mathcal{N}(0, 0.10^2)$ (first three columns) and
recentred Exp(10) (last three).
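
Rejection frequencies such as those of Table 2 are proportions of simulated samples whose p-value falls below the significance level. The following sketch shows that computation together with the binomial Monte Carlo standard error; the helper names are hypothetical and the number of replicates is left as an argument because it is not restated in this section.

```python
import math


def rejection_rate(p_values, alpha=0.05):
    """Proportion of Monte Carlo replicates in which the null is rejected."""
    return sum(p < alpha for p in p_values) / len(p_values)


def monte_carlo_se(rate, replicates):
    """Binomial standard error of an empirical rejection rate."""
    return math.sqrt(rate * (1.0 - rate) / replicates)
```

The standard error gives a rough idea of how far an empirical size can be from the nominal level because of simulation noise alone.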



            H_{1,0}                  H_{1,1}                  H_{1,2}                  H_{1,3}
Method      n=50   n=100  n=200     n=50   n=100  n=200     n=50   n=100  n=200     n=50   n=100  n=200
B-spline    0.076  0.061  0.062     0.093  0.094  0.121     0.484  0.747  0.966     0.900  0.997  1.000
FPC         0.059  0.052  0.059     0.064  0.082  0.123     0.442  0.732  0.963     0.893  0.997  1.000
FPLS        0.062  0.059  0.058     0.069  0.078  0.115     0.414  0.715  0.961     0.873  0.996  1.000

Table 3: Empirical power of the PCvM test for the composite hypothesis $H_0 : m \in \{\langle \cdot, \beta \rangle : \beta \in \mathbb{H}\}$ and for
different sample sizes n. Noise is a $\mathcal{N}(0, 0.10^2)$.



5 Data application and graphical tool 

The Tecator dataset is a well-known dataset in the literature of functional data analysis (see, for
example, Ferraty and Vieu (2006)). It contains data from 215 meat samples, consisting of a
100-channel spectrum of absorbances measured by a spectrometer and the contents of water, fat and
protein. When trying to explain the content of fat in the meat samples through the spectrometric
curves, it is common to transform the original curves into their first or second derivatives, in order
to properly capture the wavy effects of the meat samples with a high percentage of fat (see the left
plot of Figure 1).

We have applied our goodness-of-fit test with B = 5000 bootstrap replicates to the original dataset
and to the datasets of the first and second derivatives. The p-values obtained are 0.004, 0.000 and
0.000, respectively. Thus, there is significant evidence against the null hypothesis of the FLM. The
test was applied with the FPLS estimation method and with automatic selection of the number of
FPLS components by PCV. As the case of no interaction is a particular case of the FLM, we can
conclude that in the Tecator dataset there exists a significant dependence between the functional
covariate and the scalar response, although this dependence is not linear.
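
For reference, a bootstrap p-value of this kind is the proportion of bootstrap statistics that are at least as large as the observed one. The sketch below assumes this usual convention (the exact convention of the implementation is not stated here) and uses a hypothetical helper name.

```python
def bootstrap_p_value(observed, bootstrap_stats):
    """Proportion of bootstrap statistics not smaller than the observed one."""
    exceed = sum(t >= observed for t in bootstrap_stats)
    return exceed / len(bootstrap_stats)
```

Under this convention, a p-value of 0.004 with B = 5000 corresponds to 20 bootstrap statistics reaching the observed value, and a reported 0.000 means that none of them did.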

The other dataset considered is the AEMET dataset, which is available in the R package fda.usc
(see Febrero-Bande and Oviedo de la Fuente (2012)). It is formed by the daily summaries of 73
Spanish weather stations during the period 1980-2009. Among the variables recorded, the functional
covariate is the daily temperature in each weather station and the scalar response is the daily wind
speed (both variables are averaged over 1980-2009). The center plot of Figure 1 represents the
functional observations of the daily temperature. Before applying the tests, four functional outliers,
corresponding to the 5% least deep curves according to the Fraiman and Muniz (2001) depth, were
removed.
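
The trimming step can be sketched as follows. The code computes the Fraiman and Muniz depth of discretized curves as the average over the grid of the univariate depth 1 - |1/2 - F_{n,t}(x(t))|, and removes the least deep fraction. The function names are hypothetical, and rounding the number of removed curves upwards is an assumption that is consistent with removing four of the 73 curves.

```python
import math


def fraiman_muniz_depth(curves):
    """Integrated depth of each curve: mean over the grid of
    1 - |1/2 - F_{n,t}(x(t))|, with F_{n,t} the empirical cdf at point t."""
    n, n_points = len(curves), len(curves[0])
    depths = []
    for x in curves:
        total = 0.0
        for t in range(n_points):
            ecdf = sum(curves[j][t] <= x[t] for j in range(n)) / n
            total += 1.0 - abs(0.5 - ecdf)
        depths.append(total / n_points)
    return depths


def trim_least_deep(curves, fraction=0.05):
    """Indices of kept and removed curves after trimming the least deep ones.
    The number removed is rounded upwards (an assumption)."""
    n_remove = math.ceil(fraction * len(curves))
    depths = fraiman_muniz_depth(curves)
    order = sorted(range(len(curves)), key=lambda i: depths[i])
    removed = sorted(order[:n_remove])
    kept = [i for i in range(len(curves)) if i not in removed]
    return kept, removed
```

Calling `trim_least_deep(curves, 0.05)` on a list of 73 discretized curves returns the indices of the 69 curves kept and of the 4 curves flagged as outliers.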




Figure 1: From left to right: Tecator dataset with spectrometric curves coloured according to their content of 
fat (red for larger and blue for lower); AEMET temperatures for the 73 Spanish weather stations; estimated 
functional coefficient by the FPLS method for the AEMET dataset. 

The resulting p-value from the goodness-of-fit test is 0.121, thus there is no significant evidence
to reject the null hypothesis of the FLM for the AEMET dataset. The test is applied with the FPLS
estimation method and with B = 5000 bootstrap replicates. The right plot of Figure 1 shows the
estimated functional parameter $\hat{\beta}$, resulting from a basis of 2 FPLS components. Once we have
determined that the FLM is a suitable model, we can check whether the coefficient $\beta$ is
significantly different from zero with the available tests for the simple hypothesis: the functional
F-test, the test of Delsol et al. (with PCV bandwidth) and our test for the simple null hypothesis of
no interaction. The p-values obtained are 0.002, 0.000 and 0.000, respectively. All the tests reject
the null, so we can conclude that the curves of the temperature and the average wind speed show a
non-trivial linear relation.



We conclude this section by showing a graphical tool to visualize the goodness-of-fit of the FLM to a
dataset, which can be useful to practitioners. The key idea is to compare graphically the process (3)
obtained with the residuals of the fitted model with the processes obtained with the bootstrapped
residuals under the null hypothesis. The path of the RMPP depends on the random projections $\gamma$
and therefore it is difficult to compare two trajectories of the process. However, integrating with
respect to $\gamma$ results in a process that does not depend on the projections. Further, this integration
is easily approximated by Monte Carlo:

$$R_n(u) = \int_{\mathbb{S}_{\mathbb{H}}} R_n(u, \gamma)\, \omega(d\gamma) \approx \frac{1}{G} \sum_{g=1}^{G} R_n(u, \gamma_g),$$

where the $\gamma_g$ are functions in $\mathbb{S}_{\mathbb{H}}$ and G is the number of Monte Carlo replicates. For the $\gamma_g$, a
possibility is to consider stationary Gaussian processes with unit norm. Figure 2 shows the
comparison of the observed process $R_n$ and B = 100 bootstrapped processes under the null, for the
two studied datasets. Consistently with the obtained p-values, the observed processes for the Tecator
dataset seem to be significantly different from the bootstrapped ones, whereas for the AEMET dataset
the observed process is just an ordinary trajectory among the bootstrapped ones.
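
The Monte Carlo approximation above can be sketched in Python under two explicit assumptions that are not taken from this section: the projected process is assumed to have the usual residual marked form $R_n(u, \gamma) = n^{-1/2} \sum_i \hat{\varepsilon}_i 1\{\langle \mathcal{X}_i, \gamma \rangle \leq u\}$, and the curves and directions are represented by coefficient vectors in an orthonormal basis, so that inner products reduce to dot products. All names are hypothetical.

```python
import math
import random


def random_direction(p, rng):
    """Random unit vector of coefficients (normalized Gaussian draw)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z]


def projected_process(residuals, coefs, gamma, u_grid):
    """Assumed form of R_n(u, gamma): n^{-1/2} times the sum of the
    residuals whose projected covariate <X_i, gamma> is at most u."""
    n = len(residuals)
    proj = [sum(c * g for c, g in zip(row, gamma)) for row in coefs]
    return [sum(e for e, s in zip(residuals, proj) if s <= u) / math.sqrt(n)
            for u in u_grid]


def averaged_process(residuals, coefs, u_grid, n_directions, rng):
    """Monte Carlo average of R_n(u, gamma_g) over random directions."""
    p = len(coefs[0])
    total = [0.0] * len(u_grid)
    for _ in range(n_directions):
        gamma = random_direction(p, rng)
        path = projected_process(residuals, coefs, gamma, u_grid)
        total = [a + b for a, b in zip(total, path)]
    return [a / n_directions for a in total]
```

Applying `averaged_process` to the observed residuals and to each set of bootstrapped residuals, on a common grid of values u, gives the trajectories that are compared in the graphical tool.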



Figure 2: $R_n$ process observed (solid line) and B = 100 generated processes under the null hypothesis
$H_0 : m \in \{\langle \cdot, \beta \rangle : \beta \in \mathbb{H}\}$ (dashed lines), for the Tecator dataset (left) and the AEMET dataset (right). The
number of Monte Carlo replicates for the projections is G = 200.

6 Conclusions

We have presented a goodness-of-fit test for the null hypothesis of the functional linear model.
The test is constructed by adapting the proposal of Escanciano (2006) to the functional scheme using
a basis representation. Different estimation methods for the functional parameter were considered,
showing in general a similar behaviour in the performance of the test. The simulation study shows
that the test behaves well in practice: it respects the significance level and has good power. The test
was applied to two real datasets to determine whether the FLM was plausible, rejecting the null
hypothesis for the first and finding no evidence against it for the second.

The asymptotic distribution of the statistics $\mathrm{PCvM}_n$ and $\mathrm{PCvM}_{n,p}$, quadratic functionals of the
processes $R_n$ and $R_{n,p}$, respectively, is an open problem. The convergence of both processes
remains a problem of great relevance to be considered in the future, taking into account that these
processes are indexed in $\mathbb{R} \times \mathbb{H}$ and that, to our knowledge, there are no results on the weak
functional convergence of empirical processes indexed in infinite dimensional spaces.

Although in this paper we have focused on the functional linear model, the proposed test can be 
extended to checking for any other regression model with functional covariate and scalar response. 
As the statistic is based on the residuals, the practical implementation and the wild bootstrap 
calibration given in Section 3 will remain the same: we just have to consider suitable estimators 
for the parameters of the regression model to compute the residuals. Therefore, obvious extensions 
could be the testing of FLM with several covariates or the testing of the quadratic functional model. 

Finally, let us remark that the code for the implementation of the goodness-of-fit test in the simple
and composite cases is available through the function flm.test of the R library fda.usc since
version 0.9.8. This function also shows the graphical tool introduced in Section 5. To speed up the
computation of the test statistic, the critical parts of the test implementation have been programmed
in FORTRAN.
Appendix to "A goodness-of-fit test for the functional linear model with scalar response"

Eduardo García-Portugués, Wenceslao González-Manteiga and Manuel Febrero-Bande

A Proof of Lemma 2

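Proof of Lemma 2. Let $\beta$ be an element of $\mathbb{H}$. We will proceed by proving equivalences by pairs.

First of all, the equivalence of I and II is immediate from the definition $m(x) = E[Y | \mathcal{X} = x]$. The
equivalences of II, III and IV follow by Lemma 1.

The equivalence of III and V is based on the definition of the integrated regression function and is
given by a chain of equivalences. Let us denote $U_\gamma = \langle \mathcal{X}, \gamma \rangle$, for any $\gamma \in \mathbb{S}_{\mathbb{H}}$, $m_\gamma(u) = E[Y | U_\gamma = u]$
and $m_{0,\gamma}(u) = E[\langle \mathcal{X}, \beta \rangle | U_\gamma = u]$. The integrated regression functions for $m_\gamma$ and $m_{0,\gamma}$ are given
by:

$$I_\gamma(u) = E\left[Y 1_{\{U_\gamma \leq u\}}\right] = E\left[E\left[Y 1_{\{U_\gamma \leq u\}} | U_\gamma\right]\right] = E\left[E[Y | U_\gamma] 1_{\{U_\gamma \leq u\}}\right]$$
$$\phantom{I_\gamma(u)} = \int_{-\infty}^{\infty} m_\gamma(x) 1_{\{x \leq u\}}\, dF_\gamma(x) = \int_{-\infty}^{u} m_\gamma(x)\, dF_\gamma(x),$$    (12)

$$I_{0,\gamma}(u) = E\left[\langle \mathcal{X}, \beta \rangle 1_{\{U_\gamma \leq u\}}\right] = E\left[E\left[\langle \mathcal{X}, \beta \rangle 1_{\{U_\gamma \leq u\}} | U_\gamma\right]\right] = E\left[E[\langle \mathcal{X}, \beta \rangle | U_\gamma] 1_{\{U_\gamma \leq u\}}\right]$$
$$\phantom{I_{0,\gamma}(u)} = \int_{-\infty}^{\infty} m_{0,\gamma}(x) 1_{\{x \leq u\}}\, dF_\gamma(x) = \int_{-\infty}^{u} m_{0,\gamma}(x)\, dF_\gamma(x),$$    (13)

where $F_\gamma$ represents the distribution function of $U_\gamma$. Statement III can be expressed as

$$m_\gamma(u) = m_{0,\gamma}(u), \text{ for a.e. } u \in \mathbb{R},$$

which by (12) and (13) is equivalent to

$$I_\gamma(u) = I_{0,\gamma}(u), \text{ for a.e. } u \in \mathbb{R}.$$    (14)

As V is equivalent to (14), this proves the equivalence of III and V. The same argument can be
applied to prove the equivalence between IV and VI, which ends the proof.

□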
B Figures

Figure B3: Spherical wedge $S_{a,b} = \{\xi \in \mathbb{S}^p : \frac{\pi}{2} < \angle(\xi, a) < \frac{3\pi}{2}, \frac{\pi}{2} < \angle(\xi, b) < \frac{3\pi}{2}\}$ defined by points a and
b in $\mathbb{S}^3 = \{x \in \mathbb{R}^3 : \|x\| = 1\}$. The wedge is the region of the sphere determined by the intersection of the
subspaces that are generated by the normal planes of the vectors a and b.

Figure B4: The effect of the kind and length of basis expansion for functional data. From top to bottom and
left to right: sample of 100 simulated trajectories from the Ornstein-Uhlenbeck process; representation in a
B-splines basis of 50 elements; representation in a FPC basis of 5 elements; representation in a FPLS basis
of 5 elements, using an independent scalar response distributed as a $\mathcal{N}(0, 1)$.

[Figure B5 panel titles: $Y = \langle \mathcal{X}, \beta_{1,k} \rangle + \varepsilon$; $Y = \langle \mathcal{X}, \beta_{2,k} \rangle + \varepsilon$; $Y = \langle \mathcal{X}, \beta_0 \rangle + \delta_k \langle \mathcal{X}, \mathcal{X} \rangle + \varepsilon$]

Figure B5: Upper row: functional coefficient deviations of the simple null hypothesis for $H_{1,k}$ (left) and
$H_{2,k}$ (right), k = 1, 2, 3. Lower row: densities of the scalar response under the null hypothesis ($H_0$) and for
the three deviations ($H_{j,k}$, k = 1, 2, 3, for each model j = 1, 2, 3). The estimation of the densities of the
response has been done with kernel smoothing from a sample of 1000 observations. The bandwidth is the
same in the four densities of each model, and is computed by the method of Sheather and Jones (1991), for
the case of the null hypothesis.

[Figure B6 panel titles: $Y = \langle \mathcal{X}, \beta_1 \rangle + \delta_k \langle \mathcal{X}, \mathcal{X} \rangle + \varepsilon$; $Y = \langle \mathcal{X}, \beta_2 \rangle + \delta_k \langle \mathcal{X}, \mathcal{X} \rangle + \varepsilon$; $Y = \langle \mathcal{X}, \beta_3 \rangle + \delta_k \langle \mathcal{X}, \mathcal{X} \rangle + \varepsilon$]

Figure B6: Upper row: functional coefficients of the linear models for the composite hypothesis. Lower row:
densities of the scalar response under the null hypothesis ($H_{j,0}$, for each model j = 1, 2, 3) and for the three
quadratic deviations ($H_{j,k}$, k = 1, 2, 3, for each model j = 1, 2, 3). The densities are computed in the way
described for the simple hypothesis.

C Tables

C.1 Simple hypothesis



                                    Delsol et al. test
Models     F-test   PCvM test   h = h_CV   h = 0.25   h = 0.50   h = 0.75   h = 1.00
H_0        0.060    0.041       0.065      [illeg.]   0.043      0.055      0.050
H_{1,1}    0.060    0.069       0.098      0.069      0.070      0.069      [illeg.]
H_{1,2}    0.163    0.078       0.309      0.350      0.112      0.063      0.057
H_{1,3}    0.401    0.138       0.772      0.815      0.241      0.087      0.066
H_{2,1}    0.248    0.053       0.080      0.068      0.078      0.066      0.055
H_{2,2}    0.951    0.336       0.403      0.318      0.447      0.391      0.274
H_{2,3}    1.000    0.904       0.877      0.794      0.887      0.870      0.775
H_{3,1}    0.034    0.173       0.165      0.051      0.096      0.116      0.126
H_{3,2}    0.038    0.691       0.554      0.209      0.361      0.456      0.522
H_{3,3}    0.019    0.998       0.932      0.799      0.926      0.956      0.976

Table C4: Empirical power of the competing tests for the simple hypothesis $H_0 : m(\mathcal{X}) = \langle \mathcal{X}, \beta_0 \rangle$, $\beta_0(t) = 0$,
$\forall t$, and significance level $\alpha = 0.05$. Noise follows a $\mathcal{N}(0, 0.10^2)$. Entries marked [illeg.] are illegible in
the source.



                                    Delsol et al. test
Models     F-test   PCvM test   h = h_CV   h = 0.25   h = 0.50   h = 0.75   h = 1.00
H_0        0.043    0.051       0.066      0.034      0.054      0.057      0.057
H_{1,1}    0.056    0.052       0.072      0.051      0.055      0.049      0.052
H_{1,2}    0.180    0.085       0.285      0.333      0.132      0.065      0.055
H_{1,3}    0.442    0.166       0.719      0.773      0.260      0.099      0.074
H_{2,1}    0.265    0.071       0.089      0.052      0.092      0.074      0.071
H_{2,2}    0.932    0.343       0.420      0.314      0.460      0.409      0.306
H_{2,3}    0.999    0.901       0.848      0.745      0.874      0.856      0.775
H_{3,1}    0.052    0.125       0.128      0.036      0.066      0.077      0.093
H_{3,2}    0.034    0.721       0.558      0.136      0.347      0.444      0.526
H_{3,3}    0.012    1.000       0.967      0.805      0.985      0.993      0.994

Table C5: Empirical power of the competing tests for the simple hypothesis $H_0 : m(\mathcal{X}) = \langle \mathcal{X}, \beta_0 \rangle$, $\beta_0(t) = 0$,
$\forall t$, and significance level $\alpha = 0.05$. Noise follows a recentred Exp(10).

C.2 Composite hypothesis



           B-splines estimation              FPC estimation                    FPLS estimation
Models     α=0.10    α=0.05    α=0.01       α=0.10    α=0.05    α=0.01       α=0.10    α=0.05    α=0.01
H_{1,0}    0.125     0.061     0.014        0.107     0.052     0.011        0.100     0.059     0.016
H_{1,1}    0.162     0.094     0.025        0.143     0.082     0.025        0.153     0.078     0.021
H_{1,2}    0.839     0.747     0.509        0.826     0.732     0.487        0.810     0.715     0.470
H_{1,3}    1.000     0.997     0.986        1.000     0.997     0.982        1.000     0.996     0.974
H_{2,0}    [illeg.]  0.058     [illeg.]     [illeg.]  0.045     [illeg.]     [illeg.]  0.050     0.014
H_{2,1}    0.164     0.086     0.021        0.149     0.071     0.020        0.154     0.074     0.020
H_{2,2}    0.844     0.745     0.513        0.817     0.722     0.491        0.813     0.720     0.489
H_{2,3}    0.997     0.997     0.983        0.999     0.996     0.985        0.999     0.997     0.984
H_{3,0}    0.113     0.054     0.009        0.098     0.046     0.008        0.101     0.044     0.008
H_{3,1}    0.157     0.082     0.016        0.153     0.077     0.012        0.145     0.075     0.013
H_{3,2}    0.853     0.764     0.515        0.838     0.752     0.506        0.834     0.750     0.478
H_{3,3}    0.999     0.999     0.986        0.999     0.998     0.987        0.999     0.998     0.985

Table C6: Empirical power of the PCvM test for the composite hypothesis $H_0 : m \in \{\langle \cdot, \beta \rangle : \beta \in \mathbb{H}\}$ and
for three estimating methods of $\beta$. Noise follows a $\mathcal{N}(0, 0.10^2)$. Entries marked [illeg.] are illegible in the
source.



           B-splines estimation              FPC estimation                    FPLS estimation
Models     α=0.10    α=0.05    α=0.01       α=0.10    α=0.05    α=0.01       α=0.10    α=0.05    α=0.01
H_{1,0}    0.105     0.039     0.004        0.091     0.046     0.005        0.100     0.046     0.006
H_{1,1}    0.149     0.074     0.020        0.134     0.072     0.015        0.146     0.077     0.019
H_{1,2}    0.823     0.737     0.500        0.813     0.721     0.480        0.801     0.720     0.477
H_{1,3}    0.998     0.996     0.986        0.999     0.997     0.987        0.996     0.996     0.983
H_{2,0}    0.089     0.041     0.009        0.087     0.035     0.009        0.088     0.033     0.010
H_{2,1}    0.160     0.081     0.020        0.146     0.080     0.016        0.133     0.078     0.015
H_{2,2}    0.835     0.743     0.487        0.811     0.724     0.489        0.809     0.718     0.493
H_{2,3}    0.995     0.994     0.978        0.996     0.995     0.979        0.995     0.994     0.978
H_{3,0}    0.104     0.052     0.006        0.089     0.040     0.005        0.087     0.038     0.004
H_{3,1}    0.130     0.072     0.017        0.119     0.062     0.014        0.110     0.062     0.014
H_{3,2}    0.831     0.735     0.498        0.833     0.737     0.486        0.820     0.721     0.481
H_{3,3}    0.999     0.998     0.987        0.999     0.998     0.988        0.999     0.997     0.984

Table C7: Empirical power of the PCvM test for the composite hypothesis $H_0 : m \in \{\langle \cdot, \beta \rangle : \beta \in \mathbb{H}\}$ and
for three estimating methods of $\beta$. Noise follows a recentred Exp(10).

                  B-splines estimation           FPC estimation                 FPLS estimation
Models     n      α=0.10   α=0.05   α=0.01      α=0.10   α=0.05   α=0.01      α=0.10   α=0.05   α=0.01
H_{1,0}    50     0.159    0.076    0.015       0.138    0.059    0.010       0.123    0.062    0.012
           100    0.125    0.061    0.014       0.107    0.052    0.011       0.100    0.059    0.016
           200    0.115    0.062    0.010       0.106    0.059    0.009       0.106    0.058    0.010
H_{1,1}    50     0.187    0.093    0.024       0.135    0.064    0.010       0.139    0.069    0.011
           100    0.162    0.094    0.025       0.143    0.082    0.025       0.153    0.078    0.021
           200    0.212    0.121    0.040       0.202    0.123    0.033       0.207    0.115    0.033
H_{1,2}    50     0.615    0.484    0.180       0.590    0.442    0.152       0.551    0.414    0.158
           100    0.839    0.747    0.509       0.826    0.732    0.487       0.810    0.715    0.470
           200    0.982    0.966    0.892       0.981    0.963    0.897       0.980    0.961    0.870
H_{1,3}    50     0.956    0.900    0.659       0.948    0.893    0.655       0.935    0.873    0.647
           100    1.000    0.997    0.986       1.000    0.997    0.982       1.000    0.996    0.974
           200    1.000    1.000    1.000       1.000    1.000    1.000       1.000    1.000    0.999

Table C8: Empirical power of the PCvM test for the composite hypothesis $H_0 : m \in \{\langle \cdot, \beta \rangle : \beta \in \mathbb{H}\}$ and
for different sample sizes n. Noise follows a $\mathcal{N}(0, 0.10^2)$.



C.3 Traces of the test for the simple and the composite hypothesis 



Models     p = 1    p = 2    p = 3    p = 4    p = 5    p = 6
H_0        0.035    0.037    0.037    0.036    0.036    0.037
H_{1,2}    0.044    0.095    0.085    0.079    0.076    0.075
H_{2,2}    0.527    0.408    0.374    0.362    0.353    0.349
H_{3,2}    0.690    0.703    0.705    0.708    0.706    0.709

Table C9: Empirical power of the PCvM test for the simple hypothesis $H_0 : m(\mathcal{X}) = \langle \mathcal{X}, \beta_0 \rangle$, $\beta_0(t) = 0$, $\forall t$,
for different numbers p of FPC considered in the representation of the functional process. The significance
level is $\alpha = 0.05$ and noise follows a $\mathcal{N}(0, 0.10^2)$.



Models     p = 1    p = 2    p = 3    p = 4    p = 5    p = 6
H_{1,0}    0.063    0.051    0.049    0.052    0.061    0.062
H_{1,1}    0.056    0.062    0.079    0.078    0.085    0.089
H_{1,2}    0.191    0.409    0.686    0.741    0.754    0.753
H_{1,3}    0.585    0.908    0.996    0.997    0.997    0.997

Table C10: Empirical power of the PCvM test for the composite hypothesis $H_0 : m \in \{\langle \cdot, \beta \rangle : \beta \in \mathbb{H}\}$, for
different numbers p of FPC considered in the representation of the functional process. The significance level
is $\alpha = 0.05$ and noise is a $\mathcal{N}(0, 0.10^2)$.